17 - Deep Learning - Regularization Part 1 [ID:16888]

Welcome back to deep learning. So today we want to talk about regularization techniques, and we'll start with a short introduction to regularization and the general problem of overfitting. You can see that we will first start with the background, i.e. what problem regularization actually addresses, then we talk about classical techniques, normalization, dropout, initialization, transfer learning, which is a very common one, and multitask learning. So why are we talking about this topic so much?

Well, if you want to fit your data, then problems like these are easy to fit because they have a clear solution. But typically you have the problem that your data is noisy and you cannot easily separate the classes. What you then run into is the problem of underfitting: if you have a model that doesn't have a very high capacity, then you may end up with something like this line here, which is not a very good fit to describe the separation of the classes. The opposite is overfitting. Here we have models with very high capacity. These high-capacity models try to model everything that they observe in the training data, and this may yield decision boundaries that are not very reasonable. What we are actually interested in is a sensible boundary that is somehow a compromise between the observed data and the actual ground-truth representation.
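To make the effect of model capacity concrete, here is a minimal sketch that is not part of the lecture: it assumes plain NumPy and switches to a simple regression example with a made-up sine ground truth. A degree-1 polynomial cannot even fit the training data well (underfitting), while a degree-15 polynomial drives the training error down by also fitting the noise (overfitting).

```python
import numpy as np

# Toy regression data: noisy samples from a smooth ground-truth function h(x).
rng = np.random.default_rng(0)
h = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0, 1, 20)
y = h(x) + rng.normal(0, 0.2, size=x.shape)

x_dense = np.linspace(0, 1, 200)           # grid for judging the fit against h

for degree in (1, 15):                      # low capacity vs. very high capacity
    coeffs = np.polyfit(x, y, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true_mse = np.mean((np.polyval(coeffs, x_dense) - h(x_dense)) ** 2)
    print(f"degree {degree:2d}: MSE on noisy training data {train_mse:.3f}, "
          f"MSE against the true function {true_mse:.3f}")
```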

We can analyze this problem with the so-called bias-variance decomposition, and here we stick to regression, where we have an ideal function h. Its value is typically observed with some measurement noise, so there is an additional term epsilon that is added to h(x), and this is assumed to be normally distributed with zero mean and a standard deviation of sigma epsilon. Now you can go ahead and use a model to estimate h; this estimate is f hat, and it is computed from some data set D. We can then express the loss for a single point as an expected value, and here this would simply be the L2 loss: we take the true value minus the estimated function, square the difference, and compute the expected value to yield this loss.
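Written out, the setup just described reads roughly as follows; this is my transcription of the spoken formulas, with the model prediction written as f hat of x.

```latex
% Noise model assumed in the lecture (transcribed from the spoken description):
y = h(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)

% Pointwise expected L2 loss of the estimator \hat{f} trained on a data set D:
\mathbb{E}\left[ \left( y - \hat{f}(x) \right)^2 \right]
```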

Interestingly, this loss can be shown to be decomposable into two parts. There is the bias, which is essentially the deviation of the expected value of our model from the true model; it measures how far off we are on average. The other part can be explained by the limited size of the data set: we can always try to find a model that is very flexible and reduces this bias, but what we get in return is an increase in variance. The variance is the expected value of the squared difference between y hat and its expected value, so it is nothing else than the variance that we encounter in y hat. And then, of course, there is a small irreducible error. Now we can integrate this pointwise loss over every data point x, and we get the loss for the entire data set from our loss for the single point.
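As a reminder, the decomposition just described can be written as follows; this is my reconstruction of the spoken formula, with the expectation taken over training sets D and the measurement noise.

```latex
% Bias-variance decomposition of the pointwise loss:
\mathbb{E}\left[ \left( y - \hat{f}(x) \right)^2 \right]
  = \underbrace{\left( h(x) - \mathbb{E}[\hat{f}(x)] \right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[ \left( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right]}_{\text{variance}}
  + \underbrace{\sigma_\epsilon^2}_{\text{irreducible error}}

% Integrating (or summing) this pointwise loss over all x yields the loss
% for the entire data set.
```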

By the way, a similar decomposition exists for classification using the 0-1 loss, which you can see in reference 9. It is slightly different, but it has similar implications. So we learn that by accepting an increase in variance we can essentially reduce the bias, i.e. the prediction error of our model on the training data set.
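One way to see this trade-off numerically is a small Monte Carlo experiment. The following is a hedged sketch, not part of the lecture, with made-up settings (plain NumPy, a sine ground truth, polynomial models): draw many noisy training sets, fit a low-capacity and a high-capacity model on each, and estimate the squared bias and the variance of the prediction at one fixed test point.

```python
import numpy as np

rng = np.random.default_rng(1)
h = lambda x: np.sin(2 * np.pi * x)           # ideal function h(x)
sigma_eps = 0.2                                # noise standard deviation
x_train = np.linspace(0, 1, 20)
x0 = 0.3                                       # fixed test point

def simulate(degree, runs=500):
    """Fit a polynomial of the given degree on many noisy training sets
    and return (squared bias, variance) of the prediction at x0."""
    preds = []
    for _ in range(runs):
        y = h(x_train) + rng.normal(0, sigma_eps, size=x_train.shape)
        coeffs = np.polyfit(x_train, y, deg=degree)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - h(x0)) ** 2      # deviation of the mean prediction
    variance = preds.var()                     # spread of the predictions
    return bias_sq, variance

for degree in (1, 15):
    b2, var = simulate(degree)
    print(f"degree {degree:2d}: bias^2 = {b2:.4f}, variance = {var:.4f}")
```

Typically the degree-1 fit shows a large squared bias and a small variance, while the degree-15 fit shows the opposite, mirroring the decomposition above.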

Let's visualize this a bit. On the top left we see a low-bias, low-variance model: it is essentially always right and doesn't have a lot of noise in its predictions. On the top right we see a high-bias, low-variance model that is very consistent, so no variance, but it is consistently off. On the bottom left we see a low-bias, high-variance model: this has a considerable degree of variation, but on average it is very close to where it is supposed to be. And on the bottom right we have the case that we want to avoid: a high-bias, high-variance model, which has a lot of variation and is also consistently off target.

Part of a video series:

Accessible via: Open access
Duration: 00:10:35 min
Recording date: 2020-05-30
Uploaded: 2020-05-31 00:46:40
Language: en-US

Deep Learning - Regularization Part 1

This video discusses the problem of over- and underfitting. In order to get a better understanding, we explore the bias-variance trade-off and look into the effects of training data size and number of parameters.

Further Reading:
A gentle Introduction to Deep Learning

Tags: Perceptron Introduction artificial intelligence deep learning machine learning pattern recognition